An Improved Approach for Caption Based Image Web Crawler

Authors

  • Dhiraj Khurana
  • Satish Kumar
Abstract

The World Wide Web [1] is a global, read-write information space. Text documents, images, multimedia and many other items of information, referred to as resources, are identified by short, unique, global identifiers called Uniform Resource Identifiers (URIs), so that each can be found, accessed and cross-referenced in the simplest possible way. The Web is a vast reservoir of information that provides unrestricted access to a large, inexhaustible pool of information in the form of hypertext documents formatted using HyperText Markup Language (HTML). These documents contain hyperlinks to other documents.


Similar resources

Caption Crawler: Enabling Reusable Alternative Text Descriptions using Reverse Image Search

Accessing images online is often difficult for users with vision impairments. This population relies on text descriptions of images that vary based on website authors’ accessibility practices. Where one author might provide a descriptive caption for an image, another might provide no caption for the same image, leading to inconsistent experiences. In this work, we present the Caption Crawler sy...


Prioritize the ordering of URL queue in Focused crawler

The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, downloading domain-specific web pages is not a simple task. This unfocused approach often shows undesired results. Therefore, several new ideas have been proposed; among them, a key technique is focused crawling, which is able to crawl particular topical...


Marie-4: A High-Recall, Self-Improving Web Crawler That Finds Images Using Captions

page text describes associated images, and images are not captioned consistently. Content-based image retrieval systems that analyze the images themselves are progressing, but the systems require considerable image-preprocessing time. Furthermore, surveys of users doing image retrieval show that users are more interested in the identification of objects and actions depicted by images than in t...


The Use of Object Labels and Spatial Prepositions as Keywords in a Web-Retrieval-Based Image Caption Generation System

In this paper, a retrieval-based caption generation system that searches the web for suitable image descriptions is studied. Google’s search-by-image is used to find potentially relevant web multimedia content for query images. Sentences are extracted from web pages and the likelihood of the descriptions is computed to select one sentence from the retrieved text documents. The search mechanism ...


An Improved Pixon-Based Approach for Image Segmentation

An improved pixon-based method is proposed in this paper for image segmentation. In this approach, a wavelet thresholding technique is initially applied on the image to reduce noise and to slightly smooth the image. This technique causes an image not to be oversegmented when the pixon-based method is used. Indeed, the wavelet thresholding, as a pre-processing step, eliminates the unnecessary details...




Publication date: 2012